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Abstract 



The local search algorithm WSat is one of the most successful algorithms for solving 
the satisfiability (SAT) problem. It is notably effective at solving hard Random 3-SAT 
instances near the so-called 'satisfiability threshold', but still shows a peak in search cost 
near the threshold and large variations in cost over different instances. We make a number 
of significant contributions to the analysis of WSat on high-cost random instances, using 
the recently-introduced concept of the backbone of a SAT instance. The backbone is the set 
of literals which are entailed by an instance. We find that the number of solutions predicts 
the cost well for small-backbone instances but is much less relevant for the large-backbone 
instances which appear near the threshold and dominate in the overconstrained region. 
We show a very strong correlation between search cost and the Hamming distance to the 
nearest solution early in WSat's search. This pattern leads us to introduce a measure of the 
backbone fragility of an instance, which indicates how persistent the backbone is as clauses 
are removed. We propose that high-cost random instances for local search are those with 
very large backbones which are also backbone-fragile. We suggest that the decay in cost 
beyond the satisfiability threshold is due to increasing backbone robustness (the opposite 
of backbone fragility). Our hypothesis makes three correct predictions. First, that the 
backbone robustness of an instance is negatively correlated with the local search cost when 
other factors are controlled for. Second, that backbone-minimal instances (which are 3-SAT 
instances altered so as to be more backbone-fragile) are unusually hard for WSat. Third, 
that the clauses most often unsatisfied during search are those whose deletion has the most 
effect on the backbone. In understanding the pathologies of local search methods, we hope 
to contribute to the development of new and better techniques. 

1. Introduction 

Why do some problem instances require such a high computational cost for algorithms to 
solve? Answering this question will help us to understand the interaction between search 
algorithms and problem instance structure and can potentially suggest principled improve- 
ments, for example the Minimise-Kappa heuristic (Gent, Maclntyre, Prosser, &: Walsh, 
1996; Walsh, 1998). 

In this paper we study the propositional satisfiability problem (SAT). SAT is important 
as it was the first and is perhaps the archetypal NP-complete problem. Furthermore, many 
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AI tasks of practical interest such as constraint satisfaction, planning and timetabling can 
be naturally encoded as SAT instances. 

A SAT instance C is a propositional formula in conjunctive normal form. C is a bag 
of m clauses which represents their conjunction. A clause is a disjunction of literals, which 
are Boolean variables or their negations. The variables constitute a set of n symbols V. An 
assignment is a mapping from V to {true, false}. The decision question for SAT asks whether 
there exists an assignment which makes C true under the standard logical interpretation 
of the connectives. Such an assignment is a solution of the instance. If there is a solution, 
the SAT instance is said to be satisfiable. In this study, assignments where a few clauses 
are unsatisfied are also important. We refer to these as quasi- solutions. &-SAT is the SAT 
problem restricted to clauses containing k literals. Notably, the &-SAT decision problem 
is NP-hard for k > 3 (Cook, 1971). In several NP-hard decision problems, such as 3- 
SAT, certain probabilistic distributions of instances parameterised by a "control parameter" 
exhibit a sharp threshold or "phase transition" in the probability of there being a solution 
(Cheeseman, Kanefsky, Sz Taylor, 1991; Gent et al., 1996; Mitchell, Selman, Sz Levesque, 
1992). There is a critical value of the control parameter such that instances generated 
with the parameter in the region lower than the critical value (the underconstrained region) 
almost always have solutions. Those generated from the overconstrained region where the 
control parameter is higher than the critical value almost always have no solutions. 

In many problem distributions, this threshold is associated with a peak in search cost 
for a wide range of algorithms. Instances generated from the distribution with the control 
parameter near the critical value are hardest and cost decays as we move from this value 
to lower or higher values. This behaviour is of interest to basic AI research. Being devoid 
of any regularities, random instances represent the challenge faced by an algorithm in the 
absence of any assumptions about the problem domain, or once all knowledge about it has 
been exploited in the design of the algorithm or transformation of the problem instance. 

Random &-SAT is a parameterised distribution of &-SAT instances. In Random &-SAT, 
n is fixed and the control parameter is m/n. Varying m/n produces a sharp threshold in the 
probability of satisfiability and an associated cost peak for a range of complete algorithms 
(Crawford Sz Auton, 1996; Larrabee Sz Tsuji, 1992). The cost peak pattern in Random 
&-SAT has been conjectured to extend to all reasonable complete methods by Cook and 
Mitchell, (1997) who also give an overview of analytic and experimental results on the 
average-case hardness of SAT distributions. 

In this paper we study the behaviour of local search on Random &-SAT. The term local 
search encompasses a class of procedures in which a series of assignments are examined with 
the objective of finding a solution. The first assignment is typically selected at random. 
Local search then proceeds by moving from one assignment to another by "flipping" (i.e. 
inverting) the truth value of a single variable. The variable to flip is chosen using a heuristic 
which may include randomness, an element of hill-climbing (for example on the number of 
satisfied clauses) and memory. Usually, local search is incomplete for the SAT decision 
problem: there is no guarantee that if a solution exists, it will be found within any time 
bound. Unlike complete procedures, local search cannot usually prove for certain that no 
solution exists. 

It is a relatively recent discovery (e.g. Selman, Levesque and Mitchell, 1992) that the 
average cost for local search procedures scales much better than that of the best complete 
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procedures at the critical value of m/n in Random 3-SAT. More recent studies, (e.g. Parkes 
and Walser, 1996) have confirmed this in detail. Therefore in any system where completeness 
may be sacrificed, local search procedures have an important role to play, and this is why 
they have generated so much interest in recent years. 

If we restrict ourselves to those instances of the distribution which are satisfiable and 
increase the control parameter, there is a peak in the cost for local search procedures to 
solve the instances near the critical value in several constraint-like problems (Clark, Frank, 
Gent, Maclntyre, Tomov, & Walsh, 1996; Hogg & Williams, 1994). In the underconstrained 
region, the average cost increases with m/n due to the decreasing number of solutions per 
instance (Clark et al., 1996). However, in the overconstrained region, the cost decreases with 
m/n although the number of solutions per instance continues to fall. Several researchers 
have noted this fact with surprise (Clark et al., 1996; Parkes, 1997; Yokoo, 1997) since the 
number of solutions does not change in any special way near the critical value. Why, then, 
should the cost of satisfiable instances peak near the critical value, and then decay? 

Parkes (1997) provided an appealing answer to the first part of this question in his study 
of the backbone of satisfiable Random 3-SAT instances. For satisfiable SAT instances, the 
backbone is the set of literals which are logically entailed by the clauses of the instance 1 . 
Variables which appear in these entailed literals are each forced to take a particular truth 
value in all solutions. Parkes' study demonstrated that in instances from the undercon- 
strained region, only a small fraction of the variables, if any, appear in the backbone. 
However, as the control parameter is increased towards the critical value, a subclass of 
instances which have large backbones, mentioning around 75-95% of the variables, rapidly 
emerges. Soon after the control parameter is increased into the overconstrained region these 
large-backbone instances account for all but a few of the satisfiable instances. Parkes also 
showed that for a fixed value of the control parameter, the cost for the local search proced- 
ure WSat is strongly influenced by the size of the backbone. This suggests that the peak in 
average WSat cost near the critical value as the control parameter is increased may be due 
to the emergence of large-backbone instances at this point. Parkes noted that for any given 
size of backbone, the cost is actually higher for instances from the underconstrained region 
and falls as the control parameter is increased. He also identified this fall as indicative of 
another factor which produces the overall peak in cost. The main aim of this paper is to 
identify the factor responsible for this pattern; why are some instances with a certain size 
of backbone more costly to solve than others? 

The remainder of the paper is organised as follows. In Section 2 we review the details 
of the WSat algorithm and the Random &-SAT distribution and discuss the experimental 
conditions which were used. We also elucidate the patterns in cost which we intend to 
explain and show how the number of solutions and the backbone size interact. In Section 
3 we identify a remarkable pattern in WSat's search behaviour which clearly distinguishes 
high cost from lower cost instances of a certain backbone size. WSat is usually drawn 
early on in the search to quasi-solutions where a few clauses are unsatisfied. On high cost 
instances, these quasi-solutions are distant from the nearest solution, while on lower cost 
instances of equal backbone size, they are less distant. In Section 4 we develop a causal 
hypothesis, postulating a structural property of instances which induces a search space 

1. Here, our use of the term "backbone" follows Monasson, Zecchina, Kirkpatrick, Selman and Troyansky 
(1999a, 1999b) whose definition of the backbone is equivalent to ours for satisfiable instances. 
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structure which in turn causes the observed search behaviour and thus the cost pattern. We 
suggest that instances of a certain backbone size are of high cost when they are backbone- 
fragile, i.e. when the removal of a few clauses at random results in an instance with a 
greatly reduced backbone size. We discuss how this property may be measured and show 
how as the control parameter is increased, instances of a certain backbone size become less 
backb one- fragile . 

A hypothesis is only of true scientific merit if it makes correct predictions. Our hy- 
pothesis made three correct predictions for which we provide experimental evidence. In 
Section 5 we show that the degree to which an instance is backbone-fragile accounts for 
some of the variance in cost when the control parameter and the backbone-size are fixed. 
In Section 6 we consider the generation of instances which are very backbone-fragile. If 
clauses are removed such that the backbone is unaffected, we found that the resulting in- 
stances became progressively more backbone-fragile. Eventually, no more clauses can be 
removed without affecting the backbone and the instance is said to be backbone minimal. 
Our hypothesis correctly predicts that as clauses are removed in this way from Random 
3-SAT instances, the cost becomes considerably higher. In Section 7 we show that the hy- 
pothesis makes a correct prediction relating to the search behaviour: the clauses which are 
most often unsatisfied during search are those whose removal most affects the backbone. In 
Section 8 we relate this study to previous research and give suggestions for further work. 
Finally, Section 9 concludes. 

2. Background 

In this section we discuss the local search algorithm WSat, the measurement of computa- 
tional cost for it and its representativeness of local search algorithms in general. We also 
review the Random &-SAT distribution and the overall cost pattern for WSat on Random 
&-SAT. Finally we look at how backbone size and the number of solutions interact to affect 
the cost. 

2.1 The WSat Algorithm 

The term WSat was first introduced by Selman et al. (1994). It refers to a local search 
architecture which has also been the subject of a number of subsequent empirical studies 
(Hoos, 1999a; McAllester, Selman, & Kautz, 1997; Parkes & Walser, 1996; Parkes, 1997). 
A pseudocode outline of the WSat algorithm is given in Figure 1. An important feature 
of WSat is that, unlike earlier local search algorithms, it chooses an unsatisfied clause 
and then flips a variable appearing in that clause: Select- variable- from- clause must 
return a variable mentioned in clause. This architecture was first seen in the "random walk 
algorithm" due to Papadimitriou (1991). WSat may use different strategies for Select- 
VARIABLE-FROM-CLAUSE. In this study, we used the "SKC" strategy introduced by Selman, 
Kautz and Cohen (1994); we refer to this combination simply as WSat. Pseudocode for 
the SKC strategy is given in Figure 2. 

We follow Hoos (1998) in our approach to measuring the computational cost of SAT 
instances for our local search algorithm WSAT. Rather than run-times, we measure run- 
lengths : the number of flips taken to find a solution. We set the noise level p to 0.55, which 
Hoos found to be approximately optimal on Random 3-SAT. Hoos and Stutzle (1998) showed 
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WSat(C, Max-tries, Max-flips, p) 
for i = 1 to Max-tries 

T := a random assignment 
for j = 1 to Max-flips 

clause := an unsatisfied clause of C, selected at random 
« := SELECT- VARIABLE- FROM- CLAUSE(c/af/se, C, p) 
T :=T with v's value "flipped" 
if T is satisfying 

return T 
end if 
end for 
end for 

return "no satisfying assignment found" 



Figure 1: The WSat local search algorithm 

SELECT- VARIABLE- FROM- CLAUSE(c/af/se, C, p) 
for each variable x mentioned in clause 

breaks[x] := the number of clauses in C which would 
become unsatisfied if x were flipped 

end for 

if there is some variable y from clause such that breaks[y] = 

return such a variable, breaking ties randomly 
else 

with probability 1 — p 

return a variable z from clause 

which minimises break s[z], breaking ties randomly 
with probability p 

return a variable z from clause 
chosen randomly 

end if 



Figure 2: The SKC variable selection strategy 



that run lengths on all but the easiest instances are exponentially distributed for many local 
search variants. This implies that the random "restart" mechanism (the re-randomisation 
of T after Max-flips flips) is not significantly worthwhile. 
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It is not known to date whether, without using restart, WS AT will almost surely (i.e. with 
probability approaching 1) find a solution on satisfiable 3-SAT instances if given unlimited 
flips. If a local search algorithm will eventually find a solution under these conditions, it is 
said to be probabilistically approximately complete (PAC). Hoos (1999a) proved whether 
several local search algorithms were PAC and Culberson and Gent (1999a) proved that 
WSat is PAC for the 2-SAT case. Hoos (1998) observed that his data suggested WSat 
could be PAC. We set Max-tries to 1 and Max-flips infinite on all runs reported in this 
paper. A solution was found in every run, which is further evidence that WSat may be 
PAC. 

Another implication of the exponential distribution of run lengths is that a large number 
of samples must be taken to obtain a good estimate of the mean. Following Hoos, we use 
the median of 1000 WSat runs on each instance as our descriptive statistic representing 
WSat's search cost on that instance. This appears to give a stable estimate of the cost 
(as it is less sensitive to the long tail than the mean) with only a moderate amount of 
computational effort. 

One objection to studying a single algorithm from the local search class is that it may not 
be representative: results obtained for the algorithm may not generalise to other members 
of the class. While we accept this objection, there is evidence that under certain conditions, 
one local search algorithm is actually to a large extent representative of the whole class. 
For example Hoos (1998) found a very high correlation between the computational costs of 
random instances of pairs of different local search algorithms, including WSat. This also 
suggests that there is some algorithm-independent property of these instances which results 
in high cost for this class of algorithms. 

2.2 Random /c-SAT 

We use the well-studied Random &-SAT distribution (Franco & Paull, 1983; Mitchell et al., 
1992) with k = 3. Random &-SAT is a distribution of &-SAT instances, parameterised by 
the ratio of clauses to variables m/n. Let V be the fixed set of Boolean variable symbols of 
size n. To generate an instance from Random &-SAT with m clauses and n variables, each 
clause in C is independently chosen by randomly selecting as its literals k distinct variables 
from V and independently negating each with probability |. There is no guarantee that all 
variables are mentioned in the instance or that it will not contain duplicate clauses. 

As local search cannot solve unsatisfiable instances, we filter these out using a complete 
SAT procedure. In order to control for the effects of the backbone size, we will also need 
to isolate the portion of the satisfiable part of the distribution for which the backbone size 
is of a certain value. This is obtained by calculating the backbone size of each satisfiable 
instance and discarding those whose backbone is not of the required size. We will term this 
controlling the backbone size. Satisfiable instances with certain backbone sizes are rare at 
certain values of m/n. For example when m/n is 4.49, we found that only 1 in about 20,000 
generated instances is satisfiable with a backbone size of 10. Hence generation of instances 
in this way can be somewhat costly in computational terms. This was therefore one of the 
limits on the value of n for which data could be collected. 
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We were primarily interested in the threshold region of the control parameter, where the 
cost peak occurs: the region near the point at which 50% of the instances are satisfiable. 
We looked at the region between 90% and 20% satisfiability. 

2.3 A Pattern in WSat Cost for Random 3-SAT 

In Figure 3 we show the peak in WSat cost which has been mentioned e.g. by Parkes 
(1997). The peak is slightly above the 50% point (4.29) for the median but appears to shift 
down for higher percentiles. A similar pattern was noticed by Hogg and Williams (1994) in 
local search cost for graph colouring. 
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Figure 3: The cost peak for WSat as m/n is varied. At each level of m/n, we generated 
5000 satisfiable instances. We measured per-instance WSat cost for each of these. 
Each line in the plot gives a different point in the cost distribution, e.g. the 90th 
percentile is the difficulty of the 500th most costly instance for WSat. 



Both Parkes (1997) and Yokoo (1997) suggest that the local search cost peak shown for 
WSat in Figure 3 is a result of two competing factors. As m/n is increased the number 
of solutions per instance falls and this causes the onset of high cost. However, the number 
of solutions continues to fall in the overconstrained region but the cost decreases. There 
must therefore be a second factor whose effect outweighs that of the number of solutions in 
the overconstrained region so as to cause the fall in cost. The main aim of this paper is to 
identify this factor. A pattern in WSat cost on Random 3-SAT identified by Parkes (1997) 
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is our starting point. Parkes observed that for a given backbone size and n, the average 
cost falls as m/n is increased. 

Figure 4 shows the fall in WSat cost for n = 100 Random 3-SAT instances. Each 
point in the plot is the median cost of 1000 instances 2 and the length of the bars is the 
interquartile range of instance cost. The fall in cost is an approximately exponential decay 
for a range of m/n near the threshold and for a range of backbone sizes. The rate of decay 
is affected by the backbone size, with the cost of large-backbone instances decaying fastest. 
The length of the error bars in Figure 4 along with the log scale of the cost axis indicates 
that the distribution of per-instance cost is also positively skewed even once backbone size 
is controlled. For example at the point where m/n is 4.11 and backbone size is 0.9n the 
difference between the 75th percentile and the median is about 4000 whereas between the 
median and the 25th percentile it is about half that. The spread of cost is large, particularly 
relative to the effect of the control parameter. We do not think that a significant portion 
of this variance in the cost among instances is due to errors in our estimates of the cost for 
each instance. 
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Figure 4: The effect of varying m/n on cost while backbone size is controlled. 



2. The cost of each instance is denned as the median run length of 1000 runs so each point in Figure 4 is a 
median of medians. 
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2.4 The Number of Solutions when Backbone Size is Controlled 

We studied the effect of the number of solutions on WSat cost. The number of solutions 
was determined using a modified complete procedure. For small-backbone instances, there 
was some evidence that the number of solutions actually increases with m/n, at least in the 
overconstrained region. Figure 5 shows a plot of the number of solutions, with backbone 
size controlled at O.ln. Each point is the median of 1000 instances and the bars show the 
interquartile range. This possible increase in the number of solutions may help to explain 
the fall in cost for small-backbone instances, but it appears to be too weak an effect to 
account for it in full. 
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Figure 5: Number of solutions with n = 100, m/n varied, and backbone size controlled at 
O.ln. 



We studied the relationship between the number of solutions and the WSat cost with 
backbone size controlled at different values. Figure 6 shows a log-log plot of the number of 
solutions against cost, where m/n is 4.29 and backbone size is O.ln. A linear least squares 
regression (Isr) fit is superimposed. Table 1 gives summary data on the log- log scatter plot 
for different backbone sizes through the transition : the gradient and intercept of Isr fits, 
the product-moment correlation r and the rank correlation. 

The number of solutions is strongly and negatively related to the cost for smaller back- 
bone sizes through the transition and the strength of the relationship is fairly constant as 
m/n is varied. We speculate that the strong relationship on these instances arises because 
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m/n 


Backbone 


Intercept 


Gradient 


r 


Rank corr. 




size 


of Isr fit 


of Isr fit 






4.03 


O.ln 


3.8993 


-0.1967 


-0.7808 


-0.7731 




0.5n 


4.1410 


-0.2123 


-0.6761 


-0.6699 




0.9n 


4.2070 


-0.1372 


-0.1307 


-0.1365 


4.11 


O.ln 


3.8727 


-0.1989 


-0.7696 


-0.7669 




0.5n 


4.1551 


-0.2304 


-0.6834 


-0.6855 




0.9n 


4.1387 


-0.1336 


-0.1275 


-0.1291 


4.18 


O.ln 


3.7867 


-0.1911 


-0.7664 


-0.7760 




0.5n 


4.0533 


-0.2180 


-0.6932 


-0.6974 




0.9n 


4.0202 


-0.1146 


-0.1159 


-0.1217 


4.23 


O.ln 


3.7771 


-0.1932 


-0.7829 


-0.7873 




0.5n 


3.9890 


-0.2140 


-0.6729 


-0.6867 




0.9n 


3.9891 


-0.1270 


-0.1317 


-0.1329 


4.29 


O.ln 


3.7309 


-0.1910 


-0.7787 


-0.7844 




0.5n 


3.9169 


-0.2076 


-0.6921 


-0.6941 




0.9n 


3.7836 


-0.0610 


-0.0612 


-0.0534 


4.35 


O.ln 


3.6981 


-0.1896 


-0.8007 


-0.7994 




0.5n 


3.8933 


-0.2133 


-0.6872 


-0.6967 




0.9n 


3.8173 


-0.1018 


-0.1044 


-0.0903 


4.41 


O.ln 


3.6083 


-0.1782 


-0.7784 


-0.7628 




0.5n 


3.8445 


-0.2094 


-0.7024 


-0.7085 




0.9n 


3.7772 


-0.1120 


-0.1179 


-0.1045 


4.49 


O.ln 


3.5483 


-0.1748 


-0.7972 


-0.7932 




0.5n 


3.7577 


-0.2043 


-0.6954 


-0.6991 




0.9n 


3.6228 


-0.0842 


-0.0992 


-0.0783 



Table 1: Data on log- log correlations between number of solutions and cost with n = 100, 
m/n varied and backbone size fixed at different values. 
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m/n = 4.29, backbone size = 0.1 n 
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in 1n 3 
o 10 
o 
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Figure 6: Scatter plot of number of solutions and cost with n = 100, m/n = 4.29 and 
backbone size fixed at O.ln. 



finding the backbone is straightforward and the main difficulty is encountering a solution 
once the backbone has been satisfied. The density of solutions in the region satisfying the 
backbone is then important. For larger backbone sizes, the number of solutions is less 
relevant to the cost. No significant change in the number of solutions for large backbone 
instances was observed as m/n was varied. That the number of solutions and the cost are 
not strongly related for these instances is unsurprising, as the large backbone size implies 
that the solutions lie in a compact cluster and local search's main difficulty is finding this 
cluster (i.e. satisfying the backbone). Therefore we expect that the density of solutions 
within the cluster is not so important. Hoos (1998) observed that the correlation between 
number of solutions and local search cost becomes small in the overconstrained region. This 
can now be explained simply by the fact that the large-backbone instances dominate in this 
region. 

3. Search Behaviour: the Hamming Distance to the Nearest Solution 

In order to suggest the cause of the cost decay for large-backbone instances which was 
observed in Section 2.3, we made a detailed study of WSat's search behaviour, i.e. the 
assignments visited during search. We report on this exploratory part of the research in 
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this section. We explain the somewhat novel search behaviour metrics which were used 
before giving results and our discussion of them. 

3.1 Definitions and Methods 

Assuming a local search algorithm is PAC, in any given run of unlimited length, /j, the 
number of flips taken to find the first assignment where at most b clauses are unsatisfied, is 
well-defined for b > 0. /o is then equal to the run length. 

A particular run of a local search algorithm then consists of a series of assignments 
To,Ti, ...,Tj , where Tj is the assignment visited after i flips have been made. We found that 
on Random 3-SAT with n = 100, an assignment satisfying all but a few clauses was quickly 
found and that during the remainder of the search, few clauses (1 - 10) were unsatisfied. 
As shown by Gent and Walsh (1993) in GSat, there is a rapid hill-climbing phase, which is 
also suggested by Hoos (1998), followed by a long plateau-like phase in which the number 
of unsatisfied clauses is low but constantly changing. In our experiments we used as an 
arbitrary indicator of the length of the hill-climbing phase. Unlike in GSat, in WSat there 
is no well-defined end point for the hill-climbing phase, since short bursts of hill-climbing 
continue to occur for the rest of the search. We think that using /& as the indicator with 
any value of b between 1 and 10 would give similar results. 

Local search proceeds by flipping variable values and so we might expect that the Ham- 
ming distance between the current assignment and the nearest solution may also be of 
interest. The Hamming distance between two assignments hd(T\,T2) is simply the number 
of variables which T\ and Ti assign differently. We studied the Hamming distance between 
the current assignment T and the solution T so i of C for which hd(T,T so i) is minimised. We 
abbreviate this hdns(T,C) (Hamming distance to nearest solution). For any assignment 
T, hdns(T, C) may be calculated by using a complete SAT procedure which is modified so 
that every solution to C is visited and its Hamming distance from T calculated. 

3.2 Results 

In this section, data is based on Random 3-SAT instances with n = 100 and backbone size 
controlled at various values between O.ln and 0.9n. Recall that to control backbone at 
a certain value, we generate satisfiable Random 3-SAT instances as usual and discard all 
those whose backbone is not of the required size. We varied m/n from the point of 90% 
satisfiability (4.03) to the point of 20% satisfiability (4.49). hdns(Tf 5 ,C) is the Hamming 
distance between the first assignment at which no more than 5 clauses are unsatisfied and 
the nearest solution. For each instance we calculated the median value for /s and the mean 
value for hdns(Tf h , C) based on 1000 runs of WSat. In the plots in Figures 7 and 8, each 
point is the median of 1000 instances. 

Figure 7 shows the effect of varying m/n on /s when backbone size is controlled. The 
values for /s are low compared to the cost and the range is very small. So although the 
cost to find a solution varies considerably from instance to instance, quasi-solutions are 
quickly found no matter what the overall cost. However, there are some notable effects of 
backbone size and m/n on f§. As might be expected, on the larger backbone instances, for 
which overall cost is generally higher, WSat takes slightly longer to find a quasi-solution. 
The effect of m/n is unexpected. If backbone size is controlled at 0.5n or more, as m/n is 
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increased WSat takes slightly longer to find a quasi-solution, although simultaneously cost 
is decreasing as we have seen in Figure 4. 
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Figure 7: The effect of varying m/n on f$ while backbone size is controlled. 

Figure 8 shows the effect of varying m/n on hdns(Tf h , C) when the effects of backbone 
size are controlled for. In this plot, the bars give the interquartile range. The spread of 
values for mean hdns(Tf 5 ,C) at each point is also small relative to the effect of varying 
m/n. Again the positive effect of backbone size on hdns(Tf 5 ,C) is as one might expect 
since backbone size affects cost. 

With backbone size controlled, as m/n is increased through the satisfiability threshold, 
mean hdns(Tf 5 , C) decreases linearly for a wide range of backbone values. Hence, although 
a quasi-solution (7/ 6 ) is usually quickly found, on the instances of lower m/n this quasi- 
solution is considerably Hamming- distant from the nearest solution. As m/n is increased, 
while the backbone size is controlled, this effect is gradually lessened. 

We also looked at the relationship between the search behaviour and the cost when 
m/n was fixed and the backbone size was controlled. We found that in this case variance 
in hdns(Tf 5 , C) accounts for most of the cost variance. Figure 9 shows a plot of the mean 
hdns(Tf h , C) against the cost with backbone size controlled at 0.5n and m/n fixed at 4.29. 
An Isr fit is superimposed. The plot suggests hdns(Tf h , C) is linearly related to log of cost. 

Table 2 gives the intercept and gradient for Isr fits and r values with backbone size 
controlled at three values and m/n varied. Variance in hdns(Tf 5 ,C) accounts for most of 
the variance in cost at three different backbone sizes and this is consistent through the 
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Figure 8: The effect of varying m/n on hdns(Tf h , C) while backbone size is controlled. 



threshold. The scatter plots (not shown) and linear Isr fits to the data were similar in 
shape to that of Figure 9 and so are consistent with a linear relationship. The r values are 
greatest for small-backbone instances but the reasons for this are unclear. Possibly, since 
the search is shorter on the small-backbone instances, success follows quickly after and 
so hdns(Tf 5 , C) is a better indicator of the likelihood of finding a solution. 

Figure 8 showed that while backbone size is controlled, hdns(Tf h , C) falls linearly as m/n 
is increased. The gradient of the fall is about —14. Table 2 showed that if backbone size is 
controlled and m/n fixed, hdns(Tf h , C) is linearly related to log of cost, with the gradient 
of the fit being around 0.08. Given that this linear relationship continues to hold with a 
constant gradient as m/n is varied (in fact the gradient decreases slightly) and assuming 
that increasing m/n is not affecting the cost by other means, we would expect a linear 
decrease in log of mean cost with gradient —1.12, which is only slightly steeper than the 
observed decrease in log of median cost shown in Figure 4. 

So the results are consistent with the idea that whatever factor causes the cost to decay 
exponentially as m/n is varied does so largely by causing hdns(Tf 5 ,C) to fall linearly. 

3.3 Discussion 

We have identified a pattern in search behaviour which is strongly related to the pattern 
in cost discussed in Section 2.3. Our interpretation of this pattern is as follows. In each 
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m/n 


Backbone size 


Intercept 


Gradient 


r 






of Isr fit 


of Isr fit 




4.03 


O.ln 


1.0528 


0.0844 


0.9445 




0.5n 


0.6928 


0.0925 


0.8769 




0.9n 


0.7065 


0.0895 


0.7308 


4.11 


O.ln 


1.0166 


0.0868 


0.9511 




0.5n 


0.6315 


0.0955 


0.8852 




0.9n 


0.8158 


0.0867 


0.7196 


4.18 


O.ln 


1.0858 


0.0839 


0.9556 




0.5n 


0.8090 


0.0895 


0.8799 




0.9n 


0.8109 


0.0864 


0.7195 


4.23 


O.ln 


1.1290 


0.0821 


0.9581 




0.5n 


0.8343 


0.0887 


0.8974 




0.9n 


0.7480 


0.0878 


0.7691 


4.29 


O.ln 


1.1289 


0.0826 


0.9550 




0.5n 


1.0032 


0.0828 


0.8935 




0.9n 


0.8382 


0.0856 


0.7579 


4.35 


O.ln 


1.1664 


0.0811 


0.9628 




0.5n 


0.9835 


0.0842 


0.8996 




0.9n 


0.9835 


0.0808 


0.7728 


4.41 


O.ln 


1.2029 


0.0795 


0.9565 




0.5n 


1.0274 


0.0830 


0.9135 




0.9n 


1.1070 


0.0768 


0.7816 


4.49 


O.ln 


1.2458 


0.0777 


0.9661 




0.5n 


1.1472 


0.0787 


0.9197 




0.9n 


1.1930 


0.0742 


0.8086 



Table 2: Data on correlations between hdns(Tf 5 ,C) and logio cost with n = 100 and m/n 
and backbone size fixed at different values. 
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instance the quasi-solutions which WSat visits form interconnected areas of the search space 
such that local search can always move to a solution from them, without often moving to an 
assignment where many clauses are unsatisfied. The evidence for this is simply that WSat 
runs are apparently always successful but visit the assignments where more clauses are 
unsatisfied very infrequently. Prank, Cheeseman and Stutz (1997) also mentioned in their 
analysis of GSat search spaces that in Random 3-SAT, local minima where few clauses 
were unsatisfied can usually be escaped by unsatisfying just one clause. 

We believe that in instances of higher cost this quasi-solution area extends to parts of the 
search space which are Hamming- distant from solutions, whereas in instances of lower cost 
the area is less extensive. The mean Hamming distance between the early quasi-solution 
Tf 6 and the nearest solution is an accurate indicator of how extensive the quasi-solution 
area is. This interpretation suggests why hdns(Tf 5 ,C) is so strongly correlated with cost: 
the extensiveness of the quasi-solution area determines how costly it is to search. It also 
suggests why, on instances of higher cost, quasi-solutions are found slightly more quickly: 
when the quasi-solution area is extensive, from a random starting point a shorter series of 
hill-climbing flips is required to find a quasi-solution. 



250 



Backbone Fragility and the Local Search Cost Peak 



The mean hdns(Tf 5 ,C) decreases linearly as m/n is increased while backbone size is 
controlled. At the same time, cost decays exponentially. We think this is because as m/n 
is increased, the quasi-solution area becomes progressively less extensive. 

4. A Causal Hypothesis 

The pattern in search behaviour from Section 3 and our interpretation of it suggested a 
causal hypothesis to account for the decay in cost discussed in Section 2.3 and hence the 
overall peak. The key to this hypothesis is a property of SAT instances: backbone fragility. 
This property is qualitatively consistent with the above observations. Most importantly, 
although backbone fragility has implications for an instance's search space topology, it is 
a property based on the logical structure of the SAT instance. In this section we motivate 
and define backbone fragility, discuss how it may be measured and show how it relates to 
the patterns reported in Sections 2.3 and 3. 

4.1 Backbone Fragility : Motivation 

Suppose B is a small sub-bag of the clauses of a satisfiable SAT instance C, such that 
there exists a set of quasi-solutions Qb where at most the clauses B are unsatisfied. What 
structural property of C would cause the quasi-solutions to be attractive to WSat? 
We already know that if the backbone of a Random 3-SAT instance is small, its solutions 
are found with little search (Parkes, 1997). The solutions to C — B (C — B denotes C with 
one copy of each member of B removed) are either solutions to C or members of Qb- If 
we assume that the assignments which are attractive to WSat on C are approximately 
the same assignments which are attractive on C — B, then the members of Qb (which are 
solutions of C — B) will be attractive in C when the backbone oiC — B is small, particularly 
if C's backbone is large. Furthermore for any Tg 6 Qb, the number of variables which do 
not appear in the backbone of C — B is an upper bound on /idns(Tg, C), so a large reduction 
in the backbone size allows for high hdns(TB, C). To summarise, if the removal of a certain 
small sub-bag of clauses causes the backbone size to be greatly reduced, we can expect that 
quasi-solutions where only these clauses are unsatisfied will be attractive to WSat and 
possibly Hamming- distant from the nearest solution. 

We are interested in quasi-solutions in general rather than those in Qb- If removing a 
random small set of clauses on average causes a large reduction in the backbone size, we 
say that the instance is backbone-fragile. Where the effect on the backbone is smaller on 
average, the instance is backbone-robust. If a large-backbone instance is backbone- fragile, by 
extension of the above argument we expect that in general quasi-solutions will be attractive 
and they may be Hamming-distant from the nearest solution. Hence this idea is consistent 
with our observations and interpretation in Section 3: backbone fragility approximately 
corresponds to how extensive the quasi-solution area is. 

The idea that backbone fragility is the underlying factor causing the search behaviour 
pattern is appealing for other reasons. For each entailed literal I of C, there must be a 
sub-bag of clauses in C whose conjunction entails I. For any given backbone size, as clauses 
are added, for any given entailed literal / we expect that the extra clauses allow alternative 
combinations of clauses which entail I. Hence after adding clauses whilst controlling the 
backbone size, the random removal of clauses will have less effect on the backbone since 
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the fact that a literal is entailed depends less on the presence of particular sub-bags. As 
clauses are added, we expect that instances will become less backbone-fragile. Given the 
hypothetical relationship between backbone fragility and the search behaviour, this would 
then explain qualitatively why the search behaviour changes as it does when m/n is varied. 
We think that because backbone fragility is a property of the instance's logical structure, its 
study may also lead to further results about complexity issues, but we postpone discussion 
of this until Section 8. 

4.2 The Measurement of Backbone Robustness 

We now define a measure of the backbone robustness of an instance which will allow us to 
test predictions of the hypothesis. We take the instance C and delete clauses at random, 
halting the process when the backbone size is reduced by at least half. At this point we 
record as the result the number of deleted clauses. This constitutes one robustness trial. 
Our metric for backbone robustness is the mean result of all such possible trials, i.e. the 
average number of random deletions of clauses which must be made so as to reduce the 
backbone size by half. 

It is infeasible to compute the results of all possible robustness trials. Therefore, when 
measuring backbone robustness of an instance we estimated it by computing the average of 
a random sample of trials. We used at least 100 robustness trials in each case and in order 
to ensure a reasonably accurate estimate, we continued to sample more robustness trials 
until the standard error was less than 0.05 x the sample mean (in which case our estimate 
of the mean was accurate to within about 10% at the 95% confidence level). With n = 100, 
using satisfiable instances from near the satisfiability threshold whose backbone size was 
controlled at 50, usually less than 250 robustness trials were required for the estimate to 
converge in this way. Even then, backbone robustness was costly to compute. 

There were different possible metrics for backbone fragility /robustness, but we found 
that the metric described above gave the clearest results for our purposes without an unne- 
cessarily complicated definition. Other metrics, such as the reduction in backbone size when 
a random fixed fraction of clauses are removed, may be more suitable in other contexts. 

4.3 The Change in Backbone Robustness as the Control Parameter is Varied 

As discussed in Section 4.1 we expect that if backbone size is controlled, backbone robustness 
increases as m/n is increased. Since our measure of backbone robustness is defined in terms 
of the size of the backbone, it is most useful when comparing instances of equal backbone 
size. 

We found that increasing the control parameter made instances more backbone-robust, 
as expected. Figure 10 shows the effect on backbone robustness of increasing m/n through 
the satisfiability threshold while n = 100 and backbone size is controlled. Each point is the 
median of 1000 instances. 

We note that backbone robustness as defined by our measure is generally higher for 
instances with larger backbones. We think that this is because on the large-backbone 
instances, the backbone must be reduced by a larger number of literals in each fragility trial 
and that this requires more clauses to be removed. 
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Figure 10: Backbone robustness through the satisfiability transition, with backbone size 
fixed at O.ln 0.5n and 0.9n. 



5. A Correct Prediction about Cost Variance 

We may assert that the fall of cost observed with the increase in the control parameter 
is due to the change in some other factor F, as for example Yokoo (1997) has. Such an 
assertion makes an important and testable prediction: that any variation in F when the 
control parameter is fixed accounts for some of the variation in cost. However there may 
be other factors whose influence on the cost is so great as to obscure the effect of F when 
the control parameter is fixed. To best reveal the effect of F, if there is any, the effects of 
some other factors may have to be controlled for. 

Backbone robustness is our proposed factor F. The backbone size is another factor 
which strongly influences the cost. Our result in this section is that when the effects of 
m/n and backbone size are controlled for, i.e. when they are fixed, the effects of backbone 
robustness can be seen quite clearly for large-backbone instances. 

5.1 Correlation Data 

Figure 11 shows a plot of the log cost against the measure of backbone robustness for 
Random 3-SAT instances with n = 100, m/n 4.29 and backbone size controlled at O.ln, 
0.5n and 0.9n. A linear Isr fit is superimposed in each case. Table 3 gives the intercept, 



253 



Singer, Gent & Smaill 



m/n 


Backbone size 


Intercept 


Gradient 


r 


r -95% 


r +95% 


Rank corr. 






ot Isr nt 


ot Isr nt 








coemcient 


A A O 

4.03 


O.ln 


O AO O O 

3.U338 


A AOA 1 

—0.0204 


A A A O O 

—0.1928 


a o rnp 

—0.250b 


A 1 A A A 

— U.14UU 


A 1 AO /I 

—0.1934 




U.on 


9 1C\1K 

o. iviii) 


— U.Uo <U 


—U.o * oU 


fl /1 1 0.1 

— U.4ly 1 


A 999£ 

— U.ozoO 


—U.o ( lo 




L).9n 


A io A a 

4.284b 


A A A 1 n 

— U.U419 


A /I71 1 

— U.4711 


— U.5165 


A A o r 1 

— U.4251 


— U.4699 


A A 1 

4.11 


U.ln 


O l\C OA 

2.9639 


A A 1 O A 

— U.U134 


A 1 /I AA 

— U.149U 


A O AO O 

— U.2U88 


A AO TO 

— U.U873 


A 1 A AO 

— U.14U2 




U.on 


q ac.<7K 
0.00 < 


— U.UoOl 


— U.ooyl 


— U.4000 


— U.J41 / 


A 977A 
— U.o 1 10 




L).9n 


4.zzo < 


A A07A 
— U.U3 iU 


A A K 9 K 

— 0.4535 


A r AA1 

— U.5001 


A A A£ K 

— U.4U65 


a A fifiO 

—0.4662 


4.18 


U.ln 


o no/ 1 r 

2.9365 


A A A A S ' 

— U.U146 


— U.1745 


— U.2356 


A 1 1 A A 

— U.1149 


A 1 f* f* O 

—0.1663 




U.on 


O.0U0 1 


a aoaq 
— U.UoUZ 


A 9Q/1A 
— U.O04U 


A /(070 
— U.4Z ( Z 


A 9900 

— U.oooy 


A 9.790 

— U.o ( oo 




U.9n 


A 1 01 1 


a A99Q 
— U.U338 


— U.53U6 


— U.568 < 


— U.4yzl 


a r A CiCi 

—0.5466 


A O O 

4.23 


A A 

U.ln 


2.9257 


A A 1 f f 

— U.U155 


A A 1 AT 

— U.21U7 


— U.2659 


a 1 r r 9 

— U.1553 


A O 1 A O 

—0.2116 




U.on 


9. c; 1 AO 
0.014Z 


n no9.n 
— U.UZoy 


A 9ft/|9 

— U.0D40 


— U.41U0 


n 9.1 c^/i 
— U.ol04 


— U.040D 




U.9n 


A 1 9 1 9 

4.1313 


— U.U312 


a rnro 

— U.5253 


— U.5645 


a A O 1 O 

— U.4818 


a r A K 7 

— U.o4o < 


A OA 

4.29 


A A 

U.ln 


2.8766 


A A1 O/ 1 

— U.U136 


A 1 O A /I 

— U.1894 


A O/IOO 

— U.2483 


A 1 OAA 

— U.13UU 


A OAT O 

—0.2053 




U.on 


o /lorn 
o.4yUo 


— U.UZZO 


A 9CC9 

— U.ooOo 


A A 9CA 
— U.4O0U 


a qoftt; 
— U.ooyo 


— U.oyyo 




U.9n 


4.Uyo4 


a noon 
— U.U29U 


— U.5325 


—U.o < 21 


— U.4931 


n r /i £7 
—0.546 / 


/i or 

4.35 


a 1 

U.ln 


2.8261 


— U.U1U9 


— U.1671 


A OO V A 

— U.225U 


a 1 1 a r 

— U.11U5 


A 1 70 /I 

—0.1724 




u.on 


9 /I90C; 
0.40Z0 


— u.uiyy 


— U.O f 04 


fl 4999 
— U.4ZZZ 


fl 9.944 
— U.0Z44 


fl ^789 
— U.O i oZ 




U.9n 


3.9939 


-U.U237 


-U.4984 


-U.5394 


-U.4555 


-0.5243 


4.41 


U.ln 


2.7925 


-U.U1UU 


-U.1763 


-U.2321 


-0.1175 


-0.1683 




U.5n 


3.3772 


-U.U172 


-U.3452 


-U.3954 


-U.2919 


-0.3582 




U.9n 


3.9284 


-U.U211 


-U.5152 


-U.5582 


-U.4692 


-0.5270 


4.49 


U.ln 


2.7164 


-U.UU73 


-U.1392 


-U.1926 


-U.U841 


-0.1355 




U.5n 


3.35U6 


-U.U17U 


-U.4U34 


-U.4585 


-U.3516 


-0.4027 




U.9n 


3.872U 


-U.U198 


-U.5549 


-U.5949 


-U.5125 


-0.5604 



Table 3: Data on the correlation between backbone robustness and logio cost with n = 100 
and m/n and backbone size fixed at different values. 



gradient and r values for Isr fits with backbone size controlled at the same three values and 
with m/n varied through the threshold. 

The r values suggest an effect of backbone robustness on cost, particularly for large 
backbone sizes. For smaller backbone sizes, we imagine that finding the backbone is less of 
an issue and so backbone fragility, which hinders this, has less of an effect. For the larger 
backbone sizes, we think the main difficulty for WSat is satisfying the backbone; backbone 
fragility is then important. However, given the somewhat unclear shape of the scatter plots, 
there are several concerns as to the significance of the correlation, which we now address 
using some simple statistical methods. 
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Figure 11: Scatter plot of backbone robustness and cost with n = 100, m/n = 4.29 and 
backbone size fixed at O.ln, 0.5n and 0.9n. 
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5.2 An Artifact of the Distributions of the Variables? 

One concern is that the observed r could also have arisen simply because of the distributions 
of the two variables rather than because of any relationship between them. This is a serious 
concern here as the distributions are unknown. 

The null hypothesis, Hq is that the value of r which results from the distributions of the 
two variables is equal to the observed r. A randomisation method can be used to test Hq. 
See Appendix A for details of this method. For each data set presented, 1000 randomised 
pairings of the data were constructed. In each case, we found that the observed r does 
not fall within the range of the sampling distribution of r for randomised pairings. Hq can 
therefore be rejected at the 99.9% confidence level. 

The r coefficient, given above, can be greatly affected by outliers. Therefore the rank 
correlation coefficient, which is less affected, was also calculated. The rank correlation is 
also given in Table 3. We found that in each case the rank correlation coefficient is not 
considerably different from the r coefficient. This demonstrates that the observed r was not 
greatly affected by outliers. 

5.3 Confidence Intervals for the Correlation 

Given that there is a relationship between the two variables which is not merely an artifact 
of the distributions or of outliers, how accurate is our measurement of r? A bootstrap 
method can be used to obtain bounds on a confidence interval for this statistic. Again, the 
reader is referred to Appendix A for details of this method. Using this method with 1000 
pseudo-samples we obtained lower and upper bounds on the 95% confidence interval for r, 
which are also given in Table 3 as r~ 95% and r +95% respectively. The data implies with 95% 
confidence, the upper bounds on the amount of error in our estimates of r 2 are between 
about 0.02 and 0.05. 

6. A Correct Prediction about Very Backbone-Fragile Instances 

Our hypothesis proposes that high backbone fragility of instances quite accurately represents 
the factor which (via the search behaviour patterns uncovered in Section 3) causes high 
WSat cost for those instances. However, it is plausible that the high backbone fragility is 
a by-product of some unmeasured latent factor and that it is not causally related to the 
cost. 

To help establish the causal link between backbone fragility and cost, we therefore cre- 
ated sets of random SAT instances which had higher backbone fragility than usual Random 
3-SAT instances. This is to some degree following the methodological precedent of Bayardo 
and Schrag (1996), who created random instances which contained small unsatisfiable sub- 
instances but which had few constraints overall. These were often found to be exceptionally 
hard for the complete procedure Ntab. Their experiments thereby helped establish that 
this feature of instance structure was the cause of exceptionally high cost for complete 
procedures. 

We cannot easily set backbone fragility directly, since it is not a generation parameter. 
One manipulation experiment which is possible is the use of an instance generation proced- 
ure which results in instances with a higher backbone fragility. Our hypothesis predicts that 
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instances generated using such a procedure will be harder than Random 3-SAT instances. 
In this section we define such a procedure and test the prediction. It may be that our pro- 
cedure is also manipulating the latent factor. However, since the procedure is specifically 
designed to increase backbone fragility, a correct prediction here still lends credibility to 
our hypothesis. 

6.1 Backbone-minimal Sub-instances 

Suppose we have a SAT instance C and we remove a clause such that the backbone is not 
affected by the removal of the clause. If such clauses are repeatedly removed, eventually 
the instance will be such that no clause can be removed without disturbing the backbone. 
In this case we have a backbone-minimal sub-instance (BMS) of C. More formally, we have 
the following definition: 

Definition A SAT instance C is a BMS of C iff 

• C is a sub-instance of C (i.e. a sub-bag of the clauses) such that C has the same 
backbone as C. 

• for each clause c of C there exists a literal I such that: 

1. C -> I 

2. (C - {c}) A -7 is satisfiable 

i.e. every strict sub-instance of C has a strictly smaller backbone than the back- 
bone of C □ 

BMSs can be seen as satisfiable analogues of the minimal unsatisfiable sub-instances 
(MUSs) of unsatisfiable instances studied by amongst others Culberson and Gent (1999b) 
in the context of graph colouring and Gent and Walsh (1996) and Bayardo and Schrag 
(1996) in satisfiability. An MUS of an instance C is a sub-instance which is unsatisfiable, 
but such that the removal of any one clause from the sub-instance renders it satisfiable. 
Just as all unsatisfiable instances must have an MUS, all satisfiable SAT instances must 
have a BMS. Having a BMS does not depend on having a non-empty backbone - if the 
backbone of the instance is empty, its BMS is the empty sub-instance. An instance can 
have more than one BMS. Different BMSs of an instance may share clauses. One BMS of 
an instance cannot be a strict sub-instance of another. 

Suppose the backbone of a satisfiable instance C is the set of literals {h,fo, ■ ■ ■ , h}- Let 
d be the clause ~^l\ V -1/2 V ... V Then we have the following useful fact: 

Theorem C is a BMS of C iff C A d is an MUS of C A d □ 

A simple proof of the above is given in Appendix B. Due to this fact, methods for studying 
MUSs can be applied to the study of BMSs. We can study the BMSs of a satisfiable 
instance C by finding the backbone of C and then studying the MUSs of C A d: each of 
these corresponds to a BMS of C since d must be present in every MUS of C A d. 

To find a BMS of C we determine the backbone, then find a random MUS of C A d using 
the same MUS-finding method as Gent and Walsh (1996) and remove d from the result. 
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Instances 


Backbone robustness 




10th percentile 


Median 


90th percentile 


Preserve-backbone(C, 0, C) 


8.5845 


12.9700 


20.6300 


Preserve-backbone(C, 5, C) 


7.7374 


12.0317 


19.1700 


Preserve-backbone(C, 10, C) 


7.1977 


11.0851 


17.3500 


Preserve-backbone(C, 20, C) 


6.0690 


9.3913 


14.5351 


Preserve-backbone(C, 40, C) 


4.2622 


6.4899 


9.9900 


Preserve-backbone(C, 80, C) 


2.0745 


2.8661 


3.9851 


BMS 


1.0200 


1.0600 


1.1600 



Table 4: The effect of Preserve-backbone on backbone robustness. 



6.2 Interpolating Between an Instance and one of its BMSs 

Once a BMS C has been established, we can also study the effects of interpolation between 
C and C by removing at random from C some of the clauses which do not appear in 
C . This is equivalent to removing clauses at random such that the backbone is preserved. 
Preserve-backbone(C, m r , C") will denote C with m r clauses, which do not appear in 
the BMS C", removed at random. The resulting instance will have the same backbone as 
C. 

Just as increasing m/n while controlling the backbone size causes backbone robustness 
to increase, we have found that deleting clauses such that the backbone is unaffected causes 
backbone robustness (as measured above) to decrease, as one might expect. 

We used 500 Random 3-SAT instances with n = 100 and m/n = 4.29. For each instance 
we found one BMS. We then used Preserve-backbone to interpolate with m r set at 
various values. Table 4 shows the effect of increasing m r on backbone robustness. The 
BMSs of the threshold instances are so backbone-fragile that the removal of just one clause 
from them is likely to reduce the backbone by a half or more. 

Our hypothesis predicts that as this interpolation from C to C proceeds, the cost 
for local search increases because the backbone robustness decreases. It is conceivable, 
although it would be very surprising, that removing any clauses from random instances 
near the threshold generally makes their cost for local search increase. If this were the 
case, any increase in cost during interpolation towards a BMS could merely be due to the 
removal of clauses per se rather than the removal of clauses whilst preserving the backbone. 
To control for this possibility we also removed clauses according to two other procedures. 
The procedure RANDOM (C, m r ) removes m r clauses from C at random. The procedure 
Reduce-backbone(C, m r ) removes m r clauses such that each time a clause is removed, 
the size of the backbone is reduced. The clause to be removed is chosen randomly from all 
such clauses. This procedure therefore uses the opposite removal criterion to Preserve- 
backbone. If the backbone becomes empty, no further clauses are removed. 

Figure 12 shows the effect on per-instance cost of applying the three clause removal 
procedures to the same set of 500 Random 3-SAT threshold instances. Each plot is the 
median per-instance cost. 
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Figure 12: The effect of the three clause removal procedures on median per-instance cost. 



We observe that removing clauses randomly or such that the backbone is strictly re- 
duced, causes cost to be reduced, so the removal of clauses does not in itself cause higher 
cost. The Reduce-BACKBONE procedure causes a greater initial fall in cost, as the back- 
bone size is reduced more quickly than with Random. However, the cost then stabilises for 
Reduce-backbone because the backbone becomes empty and thereafter no more clauses 
are removed. 

Removing clauses according to Preserve-backbone causes the local search cost to 
increase by an amount approximately exponential in the number of clauses removed. Table 
5 gives more data on this effect and also cost data for BMSs. The interpolation shifts the 
whole distribution up, not just the median. The median cost of the BMSs, which are the 
most backbone-fragile of all the instances, is more than three times that of the 90th cost 
percentile of Random 3-SAT instances. 

The BMSs of these instances had between 254 and 318 clauses. The above results 
therefore demonstrate the existence of instances in the underconstrained region which are 
much harder than the typical instances from near the satisfiability threshold. However since 
these were not obtained by sampling from Random 3-SAT directly, we do not know how 
often they occur. As far as we know, they are vanishingly rare and therefore, in contrast 
to exceptionally hard instances for complete algorithms, it seems unlikely that they affect 
the mean cost. Also, while Gent and Walsh (1996) showed that the exceptionally hard 
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Instances 


Per- instance cost 


10th percentile 


Median 


90th percentile 


Preserve-backbone(C, 0, C) 


517 


1450 


5175 


Preserve-backbone(C, 5, C") 


537 


1515 


5657 


Preserve-backbone(C, 10, C") 


557 


1608 


6009 


Preserve-backbone(C, 20, C") 


570 


1803 


7037 


Preserve-backbone(C, 40, C") 


643 


2295 


10683 


Preserve-backbone(C, 80, C") 


816 


4154 


24313 


BMS 


1556 


16945 


135883 



Table 5: The effect of Preserve-backbone on per-instance cost. 



instances for complete algorithms are hard for a different reason from that of threshold 
instances, BMSs are apparently hard for the same reason - because they are backbone- 
fragile. 

One useful by-product of this section is a means of generating harder test instances for 
local search variants without increasing n. However these instances do require 0(m + n) 
complete searches to generate: 0(n) to determine satisfiability and the backbone and 0(m) 
to reduce to a BMS. 

7. A Correct Prediction about Search Behaviour 

Recall that in the motivating discussion of Section 4.1 it was suggested that the quasi- 
solutions in Qb would be attractive if the backbone of C — B was small. That is to say that 
the clauses of B are more likely to be the set of unsatisfied clauses if the removal of the 
clauses of B has a large effect on the backbone. This part of the hypothesis also makes a 
prediction about search behaviour - that clauses most often unsatisfied by WSat should be 
those whose removal reduces the backbone size most. In this section we show this prediction 
to be correct. 

We looked at individual instances which were cost percentiles from a set of 5000 Random 
&-SAT instances with n = 100 and m/n = 4.29. Per-instance cost was determined as in 
previous sections. For each clause in the instance, we calculated the number of backbone 
literals which were no longer entailed if the clause was removed. This is a simple measure of 
the backbone contribution (be) of the clause - how much the backbone size depends on the 
presence of the clauses. If a clause's backbone contribution is high, it is termed a backbone- 
critical clause. We made 1000 runs of WSat on each instance under the same conditions as 
in previous sections. During search, each time the current assignment changed we recorded 
whether each clause was unsatisfied. The result of averaging the number of times the clause 
was unsatisfied over all runs gives the unsatisf action frequency (uf) of that clause. 

Figure 13 shows a plot of these two quantities for the clauses of the instance whose cost 
was median of 5000 threshold instances. We note from this figure that the clauses whose 
presence contributes the most to the backbone are more often unsatisfied than average 
during WSat search. 
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Figure 13: Scatter plot of unsatisfaction frequency against backbone contribution for the 
clauses of the cost median of 5000 instances, m/n = 4.29, n = 100. 



Table 6 confirms this pattern. Each row of the table gives data for one instance. We 
selected cost percentiles; individual instances of varying degrees of difficulty. For example 
the row labelled '30th' corresponds to the instance whose cost is the 1500th in rank from 
the easiest to the most difficult of the 5000 instances, while the 50th percentile instance 
is the one used to produce Figure 13. The third and fourth columns give the mean and 
standard deviation of the unsatisfaction frequency over all clauses in the instance and the 
last two columns give the same statistics for the sub-bag of the clauses which were most 
backbone-critical (their backbone contribution was in the top 10%). 

Table 7 shows that the converse effect is also present: the clauses which are most often 
unsatisfied (their unsatisfaction frequency is in the top 10%) are more backbone-critical than 
average. Although an effect is quite clear from the means, there are sometimes particularly 
large standard deviations in be values for the most frequently unsatisfied clauses. This 
is because, as can be seen from Figure 13, some clauses are very often unsatisfied even 
though removing them on their own does not affect the backbone size at all. We have 
found in other experiments that the removal of these clauses along with other small random 
bags of clauses does on average reduce the backbone size considerably. The large standard 
deviations therefore arise because the true backbone contribution of these clauses is not 
apparent when using this simple measure. 
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Cost 


Backbone 


All clauses 


Most backbone- 


Percentile 


size 






critical clauses 






uf mean 


uf std. dev. 


uf mean 


uf std. dev. 


10th 


16 


11.3430 


8.0704 


20.8703 


8.8730 


20th 


11 


13.1079 


10.9596 


30.1817 


16.8730 


30th 


13 


21.0207 


16.3680 


41.4660 


21.1142 


40th 


36 


22.9825 


21.3118 


56.6841 


27.4660 


50th 


48 


29.5615 


25.9275 


72.0704 


38.7779 


60th 


25 


36.2940 


35.6327 


96.1664 


54.3081 


70th 


63 


52.4198 


48.1078 


119.7691 


66.8187 


80th 


70 


92.2623 


87.7827 


167.3428 


149.8058 


90th 


93 


108.3124 


127.1968 


306.7200 


198.6933 



Table 6: Unsatisfaction frequencies of clauses in different cost percentile instances. 



Cost 
Percentile 


Backbone 
size 


All clauses 


Most often 
unsatisfied clauses 


be mean 


be std. dev. 


be mean 


be std. dev. 


10th 


16 


0.5921 


1.2358 


2.0909 


1.8529 


20th 


11 


0.4848 


1.0380 


1.7727 


1.9632 


30th 


13 


0.3963 


1.2405 


1.8409 


2.4490 


40th 


36 


1.8089 


4.2411 


8.3182 


6.4620 


50th 


48 


1.0629 


3.2781 


6.3182 


6.6043 


60th 


25 


1.3800 


3.4920 


7.7500 


5.7794 


70th 


63 


3.3916 


8.8630 


14.9091 


15.8126 


80th 


70 


0.6946 


3.4577 


2.5909 


7.8602 


90th 


93 


3.0653 


10.0376 


16.9318 


20.6436 



Table 7: Backbone contributions of clauses in different cost percentile instances. 
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For instances of different costs at the satisfiability threshold, the clauses which are most 
likely to be unsatisfied during search have a higher backbone contribution than average. 
Conversely, the clauses which have the largest backbone contribution are more likely to be 
unsatisfied during search. This section therefore demonstrates that as well as explaining 
differences in cost between instances, the backbone fragility hypothesis can also explain 
differences in the difficulty of satisfying particular clauses during search. 

8. Related and Further Work 

Clark et al. (1996) showed that the number of solutions is correlated with search cost for 
a number of local search algorithms on random instances of different constraint problems, 
including Random 3-SAT. The pattern was confirmed by Hoos (1998) using an improved 
methodology. Clark et a/.'s work was the first step towards understanding the variance in 
cost when the number of constraints is fixed. We have followed their approach both by 
looking at the number of solutions and by using linear regression to estimate strengths of 
relationships between factors. 

Schrag and Crawford (1996) made an early empirical study of the clauses (including 
literals) which were entailed by Random 3-SAT instances. Parkes (1997), whose study is 
also discussed in Section 1, looked in detail at backbone size in Random 3-SAT and its effect 
on local search cost. He also linked the position of the cost peak to that of the satisfiability 
threshold by the emergence of large-backbone instances which occurs at that point. Parkes 
also identified the fall in WSat cost for instances of a given backbone size. This was 
therefore the basis for our study. Parkes conjectured that the presence of a "failed cluster" 
may be the cause of high WSat cost for some large-backbone Random 3-SAT instances. 
According to this hypothesis, the addition of a single clause could remove a group of solutions 
which is Hamming distant from the remaining solutions, reducing the size of the backbone 
dramatically. Such a clause would then have a large backbone contribution. Therefore our 
explanation for the general high cost of the threshold region has certain features in common 
with Parkes' conjecture. In particular we agree that it is the presence of clauses with a 
large backbone contribution which causes high cost. This is especially demonstrated by our 
results from Section 7. 

Frank et al. (1997) studied in detail the topology of the GSat search space induced by 
different classes of random SAT instances. Their study discussed the implications of search 
space structure for future algorithms, as well as the effects of these structures on algorithms 
such as GSat. They also noted that some local search algorithms such as WSat may be 
blind to the structures they studied because they search in different ways to GSat. 

Yokoo (1997) also addressed the question of why there is a cost peak for local search 
as m/n is increased. The approach was to analyse the whole search space of small sat- 
isfiable random instances. While in this study, we have only examined SAT, Yokoo also 
showed his results generalised to the colourability problem. Yokoo used a deterministic 
hill-climbing algorithm. He studied the number of assignments from which a solution is 
reachable (solution-reachable assignments) via the algorithm's deterministic moves, which 
largely determines the cost for the algorithm. 

We followed Yokoo in looking for a factor competing with the number of solutions whose 
effect on cost changes as m/n is increased. The factor which Yokoo proposed as the cause 
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of the overall fall in cost was the decrease in the number of local minima - assignments 
from which no local move decreased the number of unsatisfied clauses. The decrease in 
this number was demonstrated as m/n is increased. The decrease was attributed to the 
decreasing size of "basins" (interconnected regions of local minima with the same number 
of unsatisfied clauses). Yokoo claimed (p. 363) that: 

"adding constraints [...] makes the [instance] easier by decreasing the number 
of local minima" . 

However, we do not think it is clear a priori what the relationship between the number 
of local minima and the cost is in a given instance and Yokoo did not study it sufficiently 
to convince us of his explanation. In contrast with Yokoo, we have studied in detail the 
relationship between the backbone fragility of instances and WSat's cost on these instances 
and confirmed it by testing predictions of our hypothesis. Also, we studied instance proper- 
ties that related to the logical structure of the clauses rather than the search space topology 
which was induced as we think this has more potential to generalise across algorithms and 
even to address complexity issues, as we explain towards the end of this section. 

Hoos (1998) also analysed the search spaces of SAT instances in relation to local search 
cost by looking at two new measures of the induced objective function which he defined, 
including one based on local minima. Although via these measures, Hoos was not able to 
account for the Random 3-SAT cost peak, he found that the features were correlated with 
cost for some SAT encodings of other problems and has also shown (Hoos, 1999b) that his 
measures can help distinguish between alternative encodings of the same problem. 

How does the pattern we have uncovered fit in to other work on what makes instances 
require a high cost to solve? Gent and Walsh (1996) looked at the probability that an unsat- 
isfiable SAT instance became satisfiable if a fixed number of clauses are removed at random. 
The unsatisfiable instances which had the highest computational cost for a complete pro- 
cedure were found to be those which were unsatisfiability-fragile - their unsatisfiability was 
sensitive to the random removal of clauses. It may therefore be that the fragility of an in- 
stance's unsatisfiability or backbone size is the cause of high computational cost both in the 
context of complete procedures and incomplete local search, which would be an interesting 
link between the two algorithm classes. This link may form the basis of a possible explan- 
ation of the reasons why threshold Random 3-SAT instances may be universally hard in 
the average case, as opposed to merely costly for some class of algorithms. Recent work by 
Monasson et al. (1999a, 1999b) has suggested that parameterised distributions of instances 
which are hard in the average case, e.g. Random 3-SAT, exhibit a discontinuity in the 
backbone size 3 as the control parameter is varied, whereas in polynomial time average-case 
distributions, such as Random 2-SAT, the backbone size changes smoothly. They propose 
that the complexity of the distribution is linked to the presence of this discontinuity. We 
conjecture that this may be because in the asymptotic limit, instances which are backbone- 
or unsatisfiability-fragile only persist as n is increased where there is such a discontinuity. 
This line of research may therefore establish a testable causal mechanism for this pattern, 
showing how the properties of the instance distributions affect algorithm performance. 

It would be interesting to compare backbone fragility in different random distributions 
of 3-SAT instances, such as those introduced by Bayardo and Schrag (1996) and by Iwama, 

3. Monasson et aZ.'s definition of the backbone also extends to unsatisfiable instances. 
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Miyano and Asahiro (1996) to see whether differences in local search cost could be explained. 
A method which generates satisfiable instances which are quickly solved by local search is 
analysed by Koutsoupias and Papadimitriou (1992) and Gent (1998). Random clauses are 
added to the formula as in Random 3-SAT but only if they do not conflict with a certain 
solution which is set in advance. We conjecture that overconstrained examples of these are 
quickly solved by local search because they are very backbone-robust. 

An interesting possibility mentioned by Hoos and Stiitzle (1998) suggested by the expo- 
nential run length distribution, was that local search is equivalent to random generate-and- 
test in a drastically reduced search space. We conjecture that this reduced search space 
corresponds to the quasi-solution area. Measurements of hdns(TB,C) for quasi-solutions 
Tb may therefore be indicative of the extensiveness of this reduced search space, especially 
since this metric is linearly correlated with log cost. Further experimentation in this vein 
may therefore reveal more about the topology of the reduced search space which could in 
turn lead to better local search algorithms designed to exploit this knowledge. 

Finally, we should emphasise that the notions of backbone and backbone-fragility are 
equally applicable to non-random SAT instances. In future we may be able to confirm that 
the results we have shown for random SAT instances apply equally to benchmark and real- 
world SAT instances. However, one caveat here is that entailed literals may be uncommon 
in these instances and we may need to study the fragility of other sets of entailed formulas. 

9. Conclusion 

We have reconsidered the question of why cost for local search peaks near the Random 
3-SAT satisfiability threshold. The overall pattern is one of two competing factors. The 
cause of the onset of high cost as the control parameter is increased has been previously 
established as the decreasing number of solutions. We have proposed that the cause of the 
subsequent fall in cost is falling backbone fragility. 

We found a striking pattern in the search behaviour of the local search algorithm WSat. 
For instances of a given backbone size, in the underconstrained region of the control para- 
meter, WSat is attracted early on to quasi-solutions which are Hamming-distant from the 
nearest solution. This distance is also very strongly related to search cost. As the control 
parameter is increased, the distance decreases. We suggested backbone fragility was the 
cause of this pattern. 

We defined a measure of backbone robustness. Backbone-fragile instances have low 
robustness. We were then able to test predictions of the hypothesis that the fall in backbone 
fragility is the cause of the overall decay in cost as the control parameter is increased. We 
found that the hypothesis made three correct predictions. Firstly that the degree to which 
an instance is backbone-fragile is correlated with the cost when the effects of other factors 
are controlled for. Secondly, that when Random 3-SAT instances are altered so as to be 
more backbone-fragile (by removing clauses without disrupting the backbone) their cost 
increases. Thirdly, that the clauses most often unsatisfied during search are those whose 
deletion has most effect on the backbone. 

We now summarise our interpretation of the evidence. In the underconstrained region, 
instances with small backbones are predominant. In this region, the rapid hill-climbing 
phase typically results in an assignment which is close to the nearest solution (and probably 
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satisfies the backbone). Since finding the small backbone is largely accomplished by hill- 
climbing, typical cost for WSat is low in this region and variance in cost is due to variance 
in the density of solutions in the region of the search space where the backbone is satisfied. 

In the threshold region, large-backbone instances quickly appear in large quantities. 
For large-backbone instances, the main difficulty for local search is to identify the backbone 
rather than to find a solution once the backbone has been identified. The identification of a 
large backbone may be accomplished by the rapid hill-climbing phase to a greater or lesser 
extent. We think that this extent is determined by the backbone fragility of the instance. 
If a large-backbone instance is backbone-fragile the hill-climbing phase is ineffective and 
results in an assignment which is Hamming- distant from the nearest solution (probably 
implying that much of the backbone has not been identified). Then a costly plateau search 
is required to find a solution. Hence when the rare large-backbone instances do occur in the 
underconstrained region, they are extremely costly to solve because of their high backbone 
fragility. 

If a large-backbone instance is more backbone robust, the rapid hill-climbing phase is 
more effective in determining the backbone and the plateau phase is shorter. So overall 
the instance is less costly for WSat to solve. Hence for large-backbone instances, since 
backbone fragility increases as we add clauses, cost decreases. In the overconstrained region, 
large backbone instances are dominant and so backbone fragility becomes the main factor 
determining cost. Hence cost decreases in this region. Our hypothesis proposes the following 
explanation for the cost peak: Typical cost peaks in the threshold region because of the 
appearance of many large-backbone instances which are still moderately backbone-fragile, 
followed by the increasing backbone robustness of these instances. 
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Appendix A: Randomisation and Bootstrap Tests 

We summarise the methods as used in this context. Further explanation of these methods 
is given in Cohen (1995). 

A.l Randomisation for Estimating the Correlation Coefficient due to the 
Distributions of the Variables 

Randomisation can be used to estimate the correlation coefficient between the two variables 
which results simply from their distributions rather than from any relationship. We start 
with the two vectors of data x = (x\,X2, ■ ■ ■ ,xn) and y = {yi,y2, ■ ■ ■ ,Vn)- If the correlation 
coefficient is merely due to the distributions of x and y, then it is not dependent on any 
particular x-i being paired with y^. Therefore to calculate the correlation coefficient resulting 
merely from the distributions we pair the x and y data randomly. 

We construct K randomisations. Each randomisation consists of a vector y', which is 
simply a random permutation of y. For each randomisation, we calculate the correlation 
coefficient between x and y' - note that each value xi is now paired with a random value 
from y. These randomised correlation coefficients give us an estimate of the correlation 
coefficients resulting from the distributions of the variables. If K is large enough, we will 
have an accurate estimate which can be compared with the correlation coefficient in the 
observed data. 

A. 2 Bootstrap Estimation of Confidence Intervals for Correlation Coefficients 

We have an original sample ((x\,yi), (£2, 2/2), ■■■(%, Un)} of TV pairs. A pseudo-sample from 
the original also consists of N pairs. The j th pair in the pseudo-sample (x^y^) = (x q ,y q ) 
where q is a random number between 1 and N. Each pair in the pseudo-sample is chosen 
independently i.e. pairs are sampled from the original with replacement. We assume that 
our original sample of pairs of data is representative of the whole population of such pairs. 
Given this, composing pseudo-samples is just like sampling from the whole population. 
Therefore by measuring the correlation coefficient of many pseudo-samples, we can study 
what the correlation coefficient would have looked like had we taken many sets of samples 
from the whole population. From the distribution of the correlation coefficient among many 
pseudo-samples (the bootstrap sampling distribution) we can infer bounds on the confidence 
interval for the observed correlation coefficients. 

Many pseudo-samples are taken, and the correlation coefficient is calculated for each of 
the pseudo-samples. This gives the bootstrap sampling distribution of the correlation coef- 
ficient. The 97.5 th percentile of this distribution is an upper bound on the 95% confidence 
interval for the correlation coefficient, and the 2.5 th percentile is a lower bound. 

Appendix B: The Relationship Between BMSs and MUSs 

Let C be a satisfiable SAT instance and {h,h, ■ ■ ■ ,h} be the set of all literals entailed by 
C. Let d be the clause V -1/2 V ... V 

Theorem C" is a BMS of C iff C A d is an MUS of C A d □ 
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Proof Suppose C is a BMS of C. Then C A d, which is a sub-instance of C Ad, must 
be unsatisfiable, as d violates every literal in the backbone of C . lid is removed from C" Ac?, 
the result C is satisfiable. If any other clause c is removed from C A d, there must be some 
literal from the backbone of C", Zj say, such that (C — {c}) A -7$ is satisfiable. Therefore, 
since -i/j is also a literal of d, (C — {c}) A d is satisfiable. Therefore C" A d is an MUS of 
C Ad. 

Conversely, suppose C A d is an MUS of C A d. Since C A d is minimally unsatisfiable, 
C" is satisfiable. Since C is a sub-instance of C, the backbone of C" must be a subset of the 
backbone of C. Suppose there were some literal lj which was in the backbone of C but not 
in the backbone of C. Then there would be a solution to C A This would then also be 
a solution to C A d, since is one literal of d. This contradicts C Ad being unsatisfiable 
and so there can be no lj i.e. C and C must have the same backbone. 

C A d is minimally unsatisfiable. Therefore for any clause c of C", (C — {c}) A d is 
satisfiable. Any solution to (C — {c}) A d must make some literal of d true, and must 
therefore also be a solution to (C — {c}) A -iZ&. Therefore Z&, which is in the backbone of 
C", is not in the backbone of (C - {c}). Hence C is a BMS of C □ 
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